Audio-visual quality as combination of unimodal qualities: environmental effects on talking heads

Authors

Benjamin Weiss

Christine Kühnel

Sebastian Möller

Abstract

Introduction: Talking heads provide a multimodal output component for human-computer interfaces. They consist of visual facial models whose articulation is synchronized with a speech synthesis module. Because the display is reduced to a human head or upper body, the head can be shown larger, so articulation is often more clearly visible than with a full human body. Talking heads are therefore especially suited to applications such as robust speech understanding and language acquisition. Evaluation typically relies on functional tests that assess synthesis quality, e.g. with metrics such as the word error rate of human listeners or perceived naturalness (cf. [8]). But as talking heads are increasingly used as interfaces for speech-based dialogue systems and enhanced with facial expressions, the overall quality experienced by the user comes into scope.

Similar resources

Quality of talking heads in different interaction and media contexts

We investigate the impact of three different factors on the quality of talking heads as metaphors of a spoken dialogue system in the smart home domain. The main focus lies on the effect of voice and head characteristics on audio and video quality, as well as overall quality. Furthermore, the influence of interactivity and of media context on user perception is analysed. For this purpose two sub...

Audio-Visual Prosody: Perception, Detection, and Synthesis of Prominence

In this chapter, we investigate the effects of facial prominence cues, in terms of gestures, when synthesized on animated talking heads. In the first study, a speech intelligibility experiment is conducted in which acoustically degraded speech is presented to 12 subjects through a lip-synchronized talking head carrying head-nod and eyebrow-raising gestures. The experim...

Investigating Communicative Feedback Phenomena across Languages and Modalities

This thesis deals with human communicative behaviour related to feedback, analysed across languages (Italian and Swedish), modalities (auditory versus visual) and different communicative situations (human-human versus human-machine dialogues). The aim of this study is to give more insight into how humans use communicative behaviour related to feedback and at the same time to suggest a method to...

Multimodal Speech Synthesis

Multimodal Speech Synthesis (“Talking Heads”) encompasses synthesis of speech from text (“Text-to-Speech”, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio (“Visual TTS”, VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthr...

A comparison of German talking heads in a smart home environment

The authors describe a newly developed German Text-To-audiovisual-Speech (TTavS) synthesis system based on the English-speaking HeadZero. Targets of the control parameters of the talking head are generated by mapping German phonemes to the originally English visemic blend-shape controls. The resulting German version of HeadZero and the German talking head MASSY were extended to generate audi...

Publication date: 2010